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Abstract 

We propose a novel numerical method for solving inverse problems subject to impulsive noises 
which possibly contain a large number of outliers. The approach is of Bayesian type, and it exploits 
a heavy-tailed t distribution for data noise to achieve robustness with respect to outliers. A hierar- 
chical model with all hyper-parameters automatically determined from the given data is described. 
An algorithm of variational type by minimizing the Kullback-Leibler divergence between the true 
posteriori distribution and a separable approximation is developed. The numerical method is illus- 
trated on several one- and two-dimensional linear and nonlinear inverse problems arising from heat 
conduction, including estimating boundary temperature, heat flux and heat transfer coefficient. The 
results show its robustness to outliers and the fast and steady convergence of the algorithm, 
key words: impulsive noise, robust Bayesian, variational method, inverse problems 

1 Introduction 

We are interested in Bayesian approaches for inverse problems subject to impulsive noises. Bayesian infer- 
ence provides a principled framework for solving diverse inverse problems, and has demonstrated distinct 
features over deterministic techniques, e.g., Tikhonov regularization. Firstly, it can yield an ensemble of 
plausible solutions consistent with the given data. This enables quantifying the uncertainty of a specific 
solution, e.g., with credible intervals. In contrast, deterministic techniques generally content with sin- 
gling out one solution from the ensemble. Secondly, it provides a flexible regularization since hierarchical 
modeling can partially resolve the nontrivial issue of choosing an appropriate regularization parameter. 
It is known that the underlying mechanism is balancing principle [22]. Thirdly, it allows seamlessly 
integrating structural/multiscale features of the problem through careful prior modeling. Therefore, it 
has attracted attention in a wide variety of applied disciplines, e.g., geophysics [37l|34], medical imaging 
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[H Ell [T] and heat conduction [Ql |39l SQI E], see also [32 [28l [30l |3T] for other apphcations. For an 
overview of methodological developments, we refer to the monographs [37|[26l. 

Amongst existing studies on Bayesian inference for inverse problems, the Gaussian noise model has 
played a predominant role. This is often justified by appealing to central limit theorem. The theorem 
asserts that the normal distribution is a suitable model for data that are formed as the sum of a large 
number of independent components. Even in the absence of such justifications, this model is still preferred 
due to its computational/analytical conveniences, i.e., it allows direct computation of the posterior mean 
and variance and easy exploration of the posterior state space (for linear models with Gaussian priors). A 
well acknowledged limitation of the Gaussian model is its lack of robustness against the outliers, i.e., data 
points that lie far away from the bulk of the data, in the observations: A single aberrant data point can 
significantly influence all the parameters in the model, even for these with little substantive connection 
to the outlying observations [16, pp. 443]. 

However, it is clear that not all real-world data can be adequately described by the Gaussian model. 
For example, Laplacian noises can arise when acquiring certain signals [2 , and salt-and-pepper noises 
are very common in natural images/signals due to faulty memory location and transmission in noisy 
channels [6 . The impulsive nature of these noises is reflected by the heavy tail of their distributions and 
thus the presence of, possibly of a significant amount, outliers in the data. Physically, such noises arise 
from uncertainties in instrument calibration, physical limitations of acquisition devices and experimental 
(operation) conditions. Due to its lack of robustness, an inadvertent adoption of the Gaussian model can 
seriously compromise the accuracy of the estimate, and consequently does not allow full extraction of the 
information provided by the data. 

This calls for methods that are robust to the presence of outliers. There are several ways to derive 
robust estimates. One classical approach is to first identify the outliers with noise detectors, e.g., by 
adaptive media filter, and then to perform inversion/reconstruction on the data set with outliers ex- 
cluded [161 ISj- The success of such procedures relies crucially on the reliability of the noise detector. 
However, it can be highly nontrivial to accurately identify all outliers, especially in high dimensions, and 
mis-identification can adversely affect the quality of subsequent inversion. This necessitates developing 
systematic strategies for handling impulsive noises, which can be achieved by modeling the outliers ex- 
plicitly with a heavy-tailed noise distribution. The Student's t and Laplace distributions are two most 
popular choices. The use of the ^-distribution in robust Bayesian analysis is well recognized, see [29 for 
its usage in statistical contexts. In [12], the application of the EM algorithm to t models was shown. 
Recently, Tipping and Lawrence [38 developed a robust Bayesian interpolation with the t-distribution. 
Alternatively, the Laplace distribution may be employed, see, e.g., [15^ for robust probabilistic principal 
component analysis. 

This paper studies the potentials of one robust Bayesian formulation for inverse problems subject to 
impulsive noises. Impulsive noise has received some recent attention in deterministic inversion, see [TOllQ] 
and references therein, but not in the Bayesian framework. The salient features of the proposed approach 
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include uncertainty quantification of the computed solution, robustness to data outliers, and general 
applicability to both linear and nonlinear inverse problems. Therefore, it complements the developments 
of robust formulations in the framework of deterministic inverse problems [TOl |9] . As to the numerical 
exploration of the Bayesian model, we capitalize on the variational method developed in machine learning 
[25l[3l[4] for approximate inference, and thus achieve reasonable computational efficiency. The application 
of variational Bayesian formulations to inverse problems, especially nonlinear ones, is of relatively recent 
origin [35[[23], and their robust counterparts seem largely unexplored. 

The rest of the paper is structured as follows. A hierarchical formulation based on the t distribution 
for the noise is derived in Section [2] The variational method for numerically exploring the posterior 
is described in Section |3j and two algorithms are developed for linear and nonlinear inverse problems, 
respectively. In Section [4j numerical results for four benchmark inverse problems arising in heat transfer 
are presented to illustrate the features of the formulation and the convergence behavior of the algorithms. 

2 Hierarchical Bayesian inference 

In this section, we formulate the hierarchical Bayesian model for inverse problems subject to impulsive 
noises. The focus is on the noise model and hyper-parameter treatment. 
We consider for the following finite-dimensional linear inverse problem 



where K : R^, u G and y G represent the (possibly nonlinear) forward model, the 

sought-for solution and noisy observational data, respectively. 

In Bayesian formalism, the likelihood function p{y\u) incorporates the information contained in the 
data y, and it is dictated by the noise model. Let the given data y be subjected to additive noises, i.e.. 



where ^ G M"^ is a random vector corrupting the exact data y"*". In practice, a Gaussian distribution on 
each component Q is customarily assumed. The validity of this assumption relies crucially on being not 
heavy-tailed and the symmetry of the distribution, and the violation of either condition may render the 
resulting Bayesian model invalid and inappropriate, which may seriously compromise the accuracy of the 
posteriori estimate. 

In practice, data outliers can arise due to, e.g., erroneous recording and transmission in noisy channels, 
which makes the Gaussian model unsuitable. Following the interesting works [29l [171 EH] , we choose to 
model the outliers explicitly by a heavy-tailed distribution, and this yields a seamless and systematic 
framework for treating impulsive noises. We focus on the t model [16 , where the noises Q are independent 
and identically distributed according to a centered t distribution, i.e.. 



K(u) = y, 



(1) 




3 



where is a degree of freedom parameter, a is a scale parameter [16 , and r(-) is the standard Gamma 
function. Consequently, the likelihood function p{y\u) is given by 



where the subscript i denotes the ith entry of a vector. 

In Bayesian formalism, structural prior knowledge about the unknown u is encoded in the prior 
distribution p{u). Here we focus on the following simple random field 

A„. ,,2 



p(u|A)=CA5exp(^--||Lu||^j, (3) 

where || • ||2 denotes the Euclidean norm and C is a normalizing constant. The matrix L G R*^^, whose 
rank is 5, encodes the structural interactions between neighboring sites, and A is a scaling parameter. 

The hyper-parameters a and A in the likelihood p{y\u) and the prior p{u\X) play the crucial role of 
regularization parameters in classical regularization [22]. Bayesian formalism resolves the issue through 
hierarchical modeling and determines them automatically from the data y. A standard practice to select 
priors for hyper-parameters is to use conjugate priors. For the parameter A, the conjugate prior is a 
Gamma distribution G{t]a^j3)^ which is defined by 

G(t;a,/3) = -^t-^e-^^ (4) 

where a and P are nonnegative constants. The parameters v and a do not admit easy conjugate form, 
and one may opt for the maximum likelihood approach when appropriate. 
According to Bayes' rule, the posterior p(u, A|y) is related to the data y by 

/ \|v^ = p(y|u)p(u|A)p(A) 
' S S p{y\M)p{u\\)p{\)dMd\- 

Upon ignoring the (unimportant) normalizing constant p{y) = J J p{y\u)p{u\X)p{X)dudX^ the posterior 
j9(u, A|y) may be simply evaluated as 



where (ao,/3o) is the parameter pair of the Gamma distribution for A. 

The posterior state space p(u, A|y) is often high dimensional, and thus it can only be numerically 
explored. In Section [Sj we shall develop an efficient variational method for its approximate inference. 



3 Variational approximation 

In this section, we describe a variational method for efficiently constructing an approximation to the 
posterior distribution ([5|. It can deliver point estimates together with uncertainties for both the solution 
u and the hyper-parameter A. There are three major obstacles in getting a faithful approximation: 



4 



(a) nongaussian likelihood {t instead of Gaussian distribution), 

(b) statistical dependency between u and A, and 

(c) possible nonlinearity of the forward mapping K. 

To circumvent these obstacles, we shall make use of three ideas: scale-mixture representation of the t 
distribution, variational (separable) approximation for decoupling dependency, and recursive linearization 
for resolving nonlinearity. 

First we describe the scale- mixture representation. In the posterior ([5|, the t likelihood makes it hard 
to find or define a good approximation. Fortunately, it can be represented as follows [161 PP- 446] 

, r(^) i , 1 fQ^'^~^~^ 



r(|)v^<j I u \a 

I Wi _^a2 

where the density p(wi) is given by p{wi;iy^a) = ^ r^^^^ w^~^e~^^ = G^k;^;!,^^, c.f. Q. To 
simplify the expression, we introduce two independent variables ai and Pi by 

ai = - and Pi = 

and work with the parameters ai and Pi hereon. Then we have the following succinct formula 

roo I 

p{CiWi,Pi)= I J -^e~^^^p{wi]ai,j3i)dwi, 

Jo V ^TT 

with p{wi;ai^ Pi) = G {wi;ai^ Pi). Therefore, the t model is a mixture (average) of an infinite number 
of Gaussians of varying precisions with the mixture weight Wi specified by the Gamma distribution 
p{wi; ai^ Pi). The representation also explains its heavy tail: for small Wi^ the random variable d follow 
a Gaussian distribution with a large variance, and thus the realizations are likely to take large values, 
which behaves more or less like outliers. By means of scale mixture, we have introduced an extra variable, 
but effectively converted a t distribution into a Gaussian distribution. In sum, we have arrived at the 
following augmented posterior 

p(u, w, A|y) oc |W| ie-^ll^("^-yllw . p(w; ai.pi) • A^e-^H^^H' • A"°-^e-^°^, (6) 

where w G is an auxiliary random vector following the Gamma distribution, i.e., 

n 

p(w;ai,/3i) = ]^G(^i;ai,/3i) = G(w;ai,/3i), 

i=l 

W is a diagonal matrix with diagonal w, and the weighted norm || • ||w is defined by ||v||^ = v^Wv. 
The posterior p(u, w, A|y) is computationally more amenable with the variational method. 

Next we describe the variational method for approximately exploring the posterior (|6| in case of a 
linear operator K, i.e., K(u) = Ku. The derivations here follow closely [23l[38]. An approximation can 
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be derived as follows. One first transforms it into an equivalent optimization problem using the Kullback- 
Leibler divergence and then obtains an approximation by solving the optimization problem inexactly. The 
divergence w, A)|p(u, w, A|y)) between two densities g(u, w,A) and p(u, w,A|y) is defined by 

^kl(<?(u, w, A)|p(u, w, A|y)) = / / / (7(u,w,A)ln J^^'^['^\ dudwdX ^\ogp{y), 

J J J P(u,w,A,y) 

where p{y) is a normalizing constant. Since the divergence Dkl is nonnegative and vanishes if and only 

if q coincides with minimizing Dk l effectively transforms the problem into an equivalent optimization 

problem. We shall minimize the following functional, which is also denoted by Dkl 

^kl(<7(u, w, A)|p(u, w, A|y)) = / / / (7(u,w, A)ln ^^^'^[^\ dudwdX. (7) 

J J J Pi^, w,A,yj 

However, directly minimizing Dkl is still intractable since the posterior p(u, w,A|y) is not avail- 
able in closed form. We impose a separability (conditionally independence) condition for the posterior 
distributions of u, w and A to arrive at a tractable approximation, i.e., 

^(u, w. A) q{u)q{w)q{X). (8) 

Algorithm 1 Variational approximation for linear models K 
1: Set initial guess g^(w) and q^{X); 
2: for k = 1, . . . ,K do 
3: Find q^{u) by 

q^{u) = aTgmmDKL{q{u)q^-\w)q^-\X)\p{u,w,X\y))', 

g(u) 



4: Find q^{w) by 



5: Find q^{X) by 



^^(w) = argniinL>KL(^^u)^(w)g^ ^(A)|p(u, w, A|y)); 

g(w) 



q^{X) = argminL)KL(^^(u)^^(w)g(A)|p(u,w, A|y)); 



6: Check a stopping criterion; 
7: end for 

8: Return approximation q^{u)q^{-w)q^{X). 



Under condition ([8|, an approximation can be computed by Algorithm [T] Each step of the algorithm 
can be further developed as follows. Setting the first variation of Dkl with respect to q{u) to zero gives 

q^{u) (X exp (£;gfc-i(^)^fc-i(;,)[lnp(u,w, A,y)]) . 

It follows that q^{u) follows a Gaussian distribution with covariance cov^fc^^) mean given by 

coVgfc(u)[u] = [K^WfcK + AfcL^L] ^ and := Eqk^^^^[u\ = coVgfc(u) [u]K^W/ey, 



respectively, where = Eqk-i(^x)[M = ^g'«-i(w) [W], i.e., 

where refers to a normal distribution. Analogously, we can show that q^{-w) and q^{X) take a factorized 
form, i.e., 

5'=(w) = G (w;ai + i, A + i£;,.(„)[|Ku - yf]) , 
q>'{X) = G (A; ao + f , /?o + |i?,-=(u) [||Lu||2]) . 
Thus Steps 4 and 5 involve simply updating their respective parameter pairs. 

There are several viable choices for the stopping criterion at Step 6. In practice, the following two 
heuristics work well. One is to monitor the relative change of the inverse solution u/^. If the change 
between two consecutive iterations falls below a given tolerance tol, i.e., ||u/e — u/e_i ||2/||u/e||2 < tol, 
then the algorithm may stop. Another is to monitor the variable A^. Numerically, we observe that the 
algorithm converges reasonably fast and steadily. 

We briefly remark on the choice of the pair For our experiments in Section |4j one fixed 

pair = (1, 1 x 10~^^) works fairly well. In principle, it is plausible to estimate them from the 

data simultaneously with other parameters in order to adaptively accommodate noise features, especially 
for large data sets \16^. However, there are no conjugate priors compatible with the adopted variational 
framework [38]. Therefore, one possible way is to maximize the divergence with respect to It is 

easy to find that in ([7|), the only term relates to ai and Pi is given by 

n n 

nai ln/3i -nlnr(ai) + (ai - l)^Eq. ^^^^[In Wi] - f3i^Eq*^^^^[wi]. 

i=l i=l 

Taking derivatives with respect to ai and we arrive at 

1 

ln/3i - V^(ai) + -^Eq*^^^^[\nwi] = 0, 

i=l 

Pi n .^^ 

where tl^{s) = ^lnr(5) denotes the digamma function. The solution to the system is not available in 
closed form, but upon eliminating the variable it can be solved efficiently by the Newton- Raphson 
method, see e.g., [8, Sect. 3.1]. 

Finally, we briefiy mention the extension to nonlinear problems via recursive linearization [71 123] . The 
main idea is to approximate the nonlinear model K(u) by its first-order Taylor expansion K(u) around 
the mode u of an approximate posterior, i.e., 

K(u) = K(u) + J(u-u), 

where J = VuK(u) is the Jacobian of the model K with respect to u. With this linearized model K(u) 
in place of K(u), Algorithm fl] might be employed to deliver an approximation. The mode of the this 
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newly- derived approximation is then taken for (hopefully) more accurately capturing the nonlinearity 
of the genuine model K(u). This procedure is repeated until a satisfactory solution is achieved, which 
gives rise to Algorithm [2] In the inner loop, the variational approximation needs not be carried out very 
accurately. As to the stopping criterion at Step 7, there are several choices, e.g., based on the relative 
change of the inverse solution u. 

Algorithm 2 Variational approximation for nonlinear models K 
1: Given initial guess ^^(u), g^(w) and ^^(A), and set k = 0. 
2: repeat 

3: Calculate the mode and the Jacobian = VuK(u^). 

4: Construct the linearized model, i.e. K(u) = K(u^) + J^(u — u^); 

5: Find a variational approximation q^~^^ {u)q^~^^ {w)q^~^^ {X) using K(u) by Algorithm [l] 
6: Setk = k^l; 

7: until A stopping criterion is satisfied 
8: Return g^(u)g^(w)g^(A) as the solution. 



4 Numerical experiments 

Now we illustrate the proposed method on several benchmark inverse problems in Id and 2d heat transfer. 
These examples are adapted from literature [36l |20l [23l [39l |24] , and include both linear and nonlinear 
models. Throughout, the noisy data y are generated as follows 

{yj , with probability 1 — r 

Vi + ^Ci7 with probability r 

where Q follow the standard normal distribution, and (e, r) control the noise pattern: r is the corruption 
percentage and e = max^{|?/||} is the corruption magnitude. The matrix L in the prior p{u\X) is taken 
to be the first-order finite-difference operator, which enforces smoothness on the sought-for solution u. 
The parameter pairs (ao,/3o) and are both set to (1.0, 1.0 x 10~^^). We have also experimented 

with updating but it brings little improvement. Hence, we refrain from presenting the results 

by adaptively updating We shall measure the accuracy of a solution u by the relative error 

e = ||u-ut||2/||ut||2. 

The algorithm is terminated if the relative change of u falls below tol = 1.0 x 10~^. All the computations 
were performed on a dual core personal computer with 1.00 GB RAM with MATLAB version 7.0.1 . 

4.1 Cauchy problem 

This example is taken from [23l Sect. 5.2.1]. Here we consider the Cauchy problem for steady state heat 
conduction. Let Q be the unit square (0, 1)^ with its boundary T divided into two disjoint parts, i.e.. 
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(0,1) 



(1,1) 




(0,1) 



(x=0) 



(1,1) 



(x=1) 



(0,0) 



(1,0) 



(0,0) 



(1,0) 



(t=0) 

(a) (b) 
Figure 1: The domain and boundaries for Example 1 (a) and Example 3 (b). 

Table 1: Numerical results for Example 1 with various noise levels. 



0.1 



0.2 



0.3 



0.4 



0.5 



0.6 



0.7 



0.8 



0.9 



8.56e-l 8.51e-l 8.49e-l 8.49e-l 8.36e-l 8.28e-l 8.28e-l 8.28e-l 8.27e-l 
2.33e-4 3.67e-4 3.65e-4 3.67e-4 1.59e-3 2.49e-3 2.49e-3 2.49e-3 2.53e-3 



= [0, 1] X {1} and Tc = r\r^. The steady-state heat conduction is described by 

-Ay = inQ. 



It is subjected to the boundary conditions 

g on Tc 



dy_ 
dn 



and 



y = u on r^, 



where n denotes the unit outward normal. The linear operator K maps u (with g = 0) to y restricted to 
the segments = {0, 1} x (0, 1) C Fc. We refer to Fig. [TJa) for a schematic plot of the domain and its 
boundaries. The inverse problem seeks the unknown u from noisy data y. It arises, e.g., in the study of 
re-entrant space shuttles [5 and electro-cardiography [11 . For the inversion, the solution y is given by 
siuTTXie^^^ -\- xi from which both g and u can be evaluated directly. The operator K is discretized 

using piecewise linear finite element with 3200 triangular elements, see [23l [24] for details. The number 
of measurements y is 80, and the unknown u is of dimension 41. 

The numerical results for the example with various levels of noises are shown in Table [l] where e refers 
to the relative error. A first observation is that the corruption percentage r plays an important role in 
the error e, and there is a sudden loss of the accuracy e as r increases from 0.4 to 0.5. Nonetheless, the 
reconstruction remains very accurate for r up to 0.9. 

A typical realization of the noisy data of level r = 0.5 is shown in Fig. |2|a). We observe that some 
data points deviate significantly from the exact ones, and are completely erroneous. The solutions (mean 
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Table 2: Numerical results for Example 2 with various noise levels. 



0.1 



0.2 



0.3 



0.4 



0.5 



0.6 



0.7 



0.8 



A 3.29e2 3.20e2 3.04e2 2.97e2 2.86e2 2.71e2 2.55e2 2.44e2 
e 5.51e-3 6.73e-3 8.17e-3 8.28e-3 9.06e-3 9.41e-3 1.79e-2 1.79e-2 



of the Bayesian solution) by the proposed approach {t model) and the Gaussian model is shown in Figs. 
[2|b) and[2|d), respectively, where xi is the first spatial coordinate. Here, for the Gaussian model, the 
regularization parameter rj is manually tuned, i.e. 77 = 4.64, so as to yield a solution with the smallest 
possible error. The solution by the t model is in excellent agreement with the exact one, while that by 
the standard approach is completely off the track. This shows clearly the robustness of the t model. 
The covariance, see Fig. [2|c), can be used for quantifying the uncertainty of a specific solution. The 
covariance is relatively smooth, and decays quickly away from neighboring sites. The weight w admits 
nice interpretations. We plot in Fig. [2je) the noise (solid line, value in blue) and the weight (dashed 
line, value in green). There is a one-to-one correspondence of the noise sites and the sites of the weight 
with small values. Thus the weight w can accurately detect the locations of the noises and effectively 
prunes out the noises from the inversion procedure simultaneously. The convergence of Algorithm [l] is 
very steady and fast, see Fig. [2|f), and it is reached within about ten iterations. The convergence of the 
scalar A seems monotonic. 



4.2 Flux reconstruction 

This example is taken from [39^ Sect. 5.1]. Here, we consider Id transient heat transfer. Let Q be the 
interval (0, 1), and the time interval be [0, 1]. The Id transient heat conduction is described by 

with a zero initial condition and the following boundary conditions 

^ = g(t) on Sc and ^ = u(t) on S^, 
on on 

where the boundaries He = {x = 0} x [0,T] and = {x = 1} x [0,T]. The operator K maps the flux 
u (with g = {)) io y restricted to Ec. The inverse problem is to recover the flux u from noisy data y. 
For the inversion, we take g{t) = and a hat shaped flux ix, see Fig. [sj^b) for its profile. The spatial 
and temporal intervals are discretized into 101 and 201 uniform grids, respectively. The operator K 
is discretized with piecewise linear finite elements in space and backward finite-difference in time. The 
number of measurements y is 50, and the unknown u is on a coarse mesh and of size 51. 

The numerical results for Example 2 are shown in Table |2j The A value is independent of the corrup- 
tion percentage r, and the accuracy e only deteriorates very mildly with the increase of the corruption 
percentage r from 0.1 to 0.8. The solution for a typical realization of noisy data of level r = 0.5 is shown 
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Figure 2: Numerical results for Example 1 with r = 0.5 noise. In (e), the solid and dashed lines refer to 
the noise C the weight w, respectively. 
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Table 3: Numerical results for Example 3 with various noise levels. 



0.1 



0.2 



0.3 



0.4 



0.5 



0.6 



0.7 



0.8 



0.9 



A 1.07e2 1.04e2 1.03e2 1.03e2 1.03e2 1.03e2 1.02e2 1.01e2 9.78el 
e 1.30e-3 1.72e-3 1.71e-3 1.71e-3 1.70e-3 1.70e-3 1.69e-3 1.76e-3 2.27e-3 



in Fig. [3[a), where t is the temporal coordinate. It agrees excellently with the true solution, except 
small errors around the corner. The variance at the end points, especially around t = 1, is much more 
pronounced than that in the interior, see Fig. |3jb). This might be related to the causality nature of heat 
problems. The weight w detects noise sites accurately and meanwhile eliminates them from the inversion 
by putting very small weight, and the convergence of the algorithm is steady and fast, c.f. Figs. |3|c) and 
(d), respectively. 

4.3 Stationary Robin inverse problem 

This example is adapted from [23l Sect. 5.2.4] [20], to illustrate the approach for nonlinear problems. 
Let ft be the unit square (0, 1)^ with its boundary F divided into two disjoint parts, i.e., F^ = [0, 1] x {1} 
and Fc = F\Fi. The steady-state heat conduction is described by 

-Ay = iiiQ. 

It is equipped with the following boundary conditions 

^ = g{x) on Fc and ^ ^ uy = on F^, 
on on 

where u is the heat transfer coefficient. The operator K maps the coefficient u to y restricted to Fc. The 
inverse problem is to reconstruct the unknown u from noisy data y. It arises in corrosion detection [191 [24] 
and analysis of quenching process [33 . For the inversion, the flux g is set to 1, and the true coefficient 
u is given by 1 + sin(7rxi). The operator K is discretized using piecewise linear finite element with 3200 
triangular elements. The number of measurements y is 120, and the unknown u is of dimension 41. 

The numerical results for Example 3 are shown in Table [3| The observations for the linear models 
remain valid. The solution for an exemplary noise realization of level r = 0.5 is shown in Fig. [4]^ a), which 
agrees well with the exact one. The convergence of Algorithm [2] is achieved within four (outer) iterations, 
see Fig. ^d). The convergence behavior of the algorithm in the inner loop is similar to that of linear 
cases. The plateaus indicate that the tolerance tol = 1.0 x 10~^ is a bit too conservative, and the first 
two iterations may be solved less accurately without sacrificing the accuracy of the final solution. 
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(a) mean by t model (b) covariance by t model 




i k 
(c) weight w v.s. noise ^ (d) convergence of Alg. jij 

Figure 4: Numerical results for Example 3 with r = 0.5 noise. In (c), the solid and dashed lines refer to 
the noise and the weight w, respectively. 
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Table 4: Numerical results for Example 4 with various levels. 



0.1 



0.2 



0.3 



0.4 



0.5 



0.6 



0.7 



0.8 



0.9 



A 2.25e2 2.23e2 2.23e2 2.26e2 2.27e2 2.27e2 2.32e2 2.34e2 2.29e2 
e 3.94e-2 3.97e-2 3.98e-2 4.05e-2 4.11e-2 4.14e-2 4.25e-2 4.31e-2 4.32e-2 



4.4 Transient Robin inverse problem 

This last example is adapted from [36]. Here we consider again Id transient heat transfer. Let Q be the 
spatial interval (0, 1), and the time interval be [0, 1]. The Id transient heat conduction is described by 

with a zero initial condition and the following boundary conditions 

dy dy 

—— — g(t) on Sc and - — \- uy = on S^, 

on on 

where u{t) is a time-dependent heat transfer coefficient, and the boundaries = = 0} x [0,T] and 
T^i = {x = 1} X [0,T]. The operator K maps the coefficient u to y restricted to Sc. The inverse problem 
is to estimate the coefficient u from noisy data y [36l [21]. For the inversion, the flux g is set to 1, 
and the true coefficient u = 1 -\- ^X[^,^] is discontinuous, where x denotes the characteristic function. 
The spatial and temporal intervals are both discretized into 100 uniform intervals. The operator K is 
discretized with piecewise linear finite elements in space and backward finite-difference in time. The 
number of measurements y is 101, and the the unknown u is of dimension 101. 

The numerical results for Example 4 with data of various noise levels are shown in Table [4j The 
accuracy e only deteriorates very mildly as the corruption percentage r increases from 0.1 to 0.9. The 
result for a typical realization of noisy data with r = 0.5 is shown in Fig. [5] The solution is not as 
accurate as before, since it oscillates slightly around the discontinuities of the true solution. This is 
a consequence of the smoothness prior adopted here, which in principle is unsuited to reconstructing 
discontinuous profiles. Nonetheless, the solution is reasonable as the overall profile of the true solution 
is largely retrieved, and the magnitude is accurate. The convergence of Algorithm [2] remains very stable 
for the discontinuous solution. However, there are several large plateaus, which might be pruned out by 
increasing the tolerance tol so as to effect the desired computational speedup. 



5 Concluding remarks 

In this paper we have developed a robust Bayesian approach to inverse problems subject to impulsive 
noises. It explicitly adopts a heavy-tailed t model to cope with data outliers, and it admits a scale 
mixture representation, which enables deriving efficient variational algorithms. The approach has been 
illustrated on several benchmark linear and nonlinear inverse problems arising in heat transfer. The 
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(a) mean by t model 

5, ^ ^ ^ ^ 




(c) weight w v.s. noise ^ 




(b) covariance by t model 




(d) convergence of Alg. 



Figure 5: Numerical results for Example 4 with r = 0.5 noise. In (c), the solid and dashed lines refer to 
the noise and the weight w, respectively. 
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numerical results are accurate and stable even in the presence of a fairly large amount of data outliers, 
and it is much more robust compared with the conventional Gaussian model. 

There are several avenues deserving further research. We have restricted our attention to the simplest 
Markov random field. A natural research problem would be the extension to general random fields, 
especially sparsity-promoting prior. Second, it is useful to develop alternative techniques, e.g., based on 
the Laplace model, and to compare their merits. The method can only achieve reasonable computational 
efficiency for medium-scale problems due to the variance component. It is of interest to develop scalable 
algorithms by imposing further restrictions on the approximation. Lastly, rigorous justification of the 
excellent performance of the model as well as the algorithm, e.g., consistency and convergence rate, is of 
immense theoretical importance, and is to be established. 
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